Home Catalogue search

eng

Refine your search:
- Keyword
- Creator / Publisher
- Year
- Medium
- Type
- BLLDB-Access:
  - free (85)
  - subject to license (0)

Search in the Catalogues and Directories






	Sort by
Simple Search

Page: 1 2 3 4 5

Hits 1 – 20 of 85

1	Towards Unsupervised Content Disentanglement in Sentence Representations via Syntactic Roles
	Felhi, Ghazi; Le Roux, Joseph; Seddah, Djamé
	In: CtrlGen: Controllable Generative Modeling in Language and Vision ; https://hal.inria.fr/hal-03540084 ; CtrlGen: Controllable Generative Modeling in Language and Vision, Jan 2022, virtual, France (2022)
	BASE
	Show details

2	Can Character-based Language Models Improve Downstream Task Performance in Low-Resource and Noisy Language Scenarios?
	Riabi, Arij; Sagot, Benoît; Seddah, Djamé
	In: Seventh Workshop on Noisy User-generated Text (W-NUT 2021, colocated with EMNLP 2021) ; https://hal.inria.fr/hal-03527328 ; Seventh Workshop on Noisy User-generated Text (W-NUT 2021, colocated with EMNLP 2021), Jan 2022, punta cana, Dominican Republic ; https://aclanthology.org/2021.wnut-1.47/ (2022)
	Abstract: International audience ; Recent impressive improvements in NLP, largely based on the success of contextual neural language models, have been mostly demonstrated on at most a couple dozen high-resource languages. Building language models and, more generally, NLP systems for non-standardized and low-resource languages remains a challenging task. In this work, we focus on North-African colloquial dialectal Arabic written using an extension of the Latin script, called NArabizi, found mostly on social media and messaging communication. In this low-resource scenario with data displaying a high level of variability, we compare the downstream performance of a character-based language model on part-of-speech tagging and dependency parsing to that of monolingual and multilingual models. We show that a character-based model trained on only 99k sentences of NArabizi and fined-tuned on a small treebank of this language leads to performance close to those obtained with the same architecture pre-trained on large multilingual and monolingual models. Confirming these results a on much larger data set of noisy French user-generated content, we argue that such character-based language models can be an asset for NLP in low-resource and high language variability set-tings.
	Keyword: [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]; [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]; [INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR]; [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG]; [INFO.INFO-SI]Computer Science [cs]/Social and Information Networks [cs.SI]; [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing
	URL: https://hal.inria.fr/hal-03527328
	BASE
	Hide details

3	First Align, then Predict: Understanding the Cross-Lingual Ability of Multilingual BERT
	Muller, Benjamin; Elazar, Yanai; Sagot, Benoît...
	In: https://hal.inria.fr/hal-03161685 ; 2021 (2021)
	BASE
	Show details

4	Can Multilingual Language Models Transfer to an Unseen Dialect? A Case Study on North African Arabizi
	Muller, Benjamin; Sagot, Benoît; Seddah, Djamé
	In: https://hal.inria.fr/hal-03161677 ; 2021 (2021)
	BASE
	Show details

5	First Align, then Predict: Understanding the Cross-Lingual Ability of Multilingual BERT
	Muller, Benjamin; Elazar, Yanai; Sagot, Benoît...
	In: EACL 2021 - The 16th Conference of the European Chapter of the Association for Computational Linguistics ; https://hal.inria.fr/hal-03239087 ; EACL 2021 - The 16th Conference of the European Chapter of the Association for Computational Linguistics, Apr 2021, Kyiv / Virtual, Ukraine ; https://2021.eacl.org/ (2021)
	BASE
	Show details

6	When Being Unseen from mBERT is just the Beginning: Handling New Languages With Multilingual Language Models
	Muller, Benjamin; Anastasopoulos, Antonios; Sagot, Benoît...
	In: NAACL-HLT 2021 - 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies ; https://hal.inria.fr/hal-03251105 ; NAACL-HLT 2021 - 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Jun 2021, Mexico City, Mexico (2021)
	BASE
	Show details

7	PAGnol: An Extra-Large French Generative Model
	Launay, Julien; Tommasone, Giuseppe Luca; Pannier, Baptiste...
	In: https://hal.inria.fr/hal-03540159 ; [Research Report] LightON. 2021 (2021)
	BASE
	Show details

8	Synthetic Data Augmentation for Zero-Shot Cross-Lingual Question Answering
	Riabi, Arij; Scialom, Thomas; Keraron, Rachel...
	In: https://hal.inria.fr/hal-03109187 ; 2021 (2021)
	BASE
	Show details

9	Noisy UGC Translation at the Character Level: Revisiting Open-Vocabulary Capabilities and Robustness of Char-Based Models
	Núñez, José Carlos Rosales; Wisniewski, Guillaume; Seddah, Djamé
	In: W-NUT 2021 - 7th Workshop on Noisy User-generated Text (colocated with EMNLP 2021) ; https://hal.inria.fr/hal-03540174 ; W-NUT 2021 - 7th Workshop on Noisy User-generated Text (colocated with EMNLP 2021), Association for computational linguistics, Nov 2021, Punta Cana, Dominican Republic (2021)
	BASE
	Show details

10	Understanding the Impact of UGC Specificities on Translation Quality
	Rosales Nunez, José Carlos; Seddah, Djamé; Wisniewski, Guillaume
	In: W-NUT 2021 - Seventh Workshop on Noisy User-generated Text (colocated with EMNLP 2021) ; https://hal.inria.fr/hal-03540175 ; W-NUT 2021 - Seventh Workshop on Noisy User-generated Text (colocated with EMNLP 2021), association for computational linguistics, Nov 2021, Punta Cana, Dominican Republic (2021)
	BASE
	Show details

11	Challenging the Semi-Supervised VAE Framework for Text Classification
	Felhi, Ghazi; Roux, Joseph Le; Seddah, Djamé
	In: Second Workshop on Insights from Negative Results in NLP (colocated with EMNLP) ; https://hal.inria.fr/hal-03540081 ; Second Workshop on Insights from Negative Results in NLP (colocated with EMNLP), Nov 2021, Punta Cana, Dominican Republic ; https://insights-workshop.github.io/2021/ (2021)
	BASE
	Show details

12	Deep Sequoia corpus - PARSEME-FR corpus - FrSemCor
	Barque, Lucie; Candito, Marie; Constant, Matthieu. - : ANR, 2021
	BASE
	Show details

13	IWPT 2021 Shared Task Data and System Outputs
	Zeman, Daniel; Bouma, Gosse; Seddah, Djamé. - : Universal Dependencies Consortium, 2021
	BASE
	Show details

14	Universal Dependencies 2.9
	Zeman, Daniel; Nivre, Joakim; Abrams, Mitchell. - : Universal Dependencies Consortium, 2021
	BASE
	Show details

15	Universal Dependencies 2.8.1
	Zeman, Daniel; Nivre, Joakim; Abrams, Mitchell. - : Universal Dependencies Consortium, 2021
	BASE
	Show details

16	Universal Dependencies 2.8
	Zeman, Daniel; Nivre, Joakim; Abrams, Mitchell. - : Universal Dependencies Consortium, 2021
	BASE
	Show details

17	Can Character-based Language Models Improve Downstream Task Performance in Low-Resource and Noisy Language Scenarios? ...
	Riabi, Arij; Sagot, Benoît; Seddah, Djamé. - : arXiv, 2021
	BASE
	Show details

18	First Align, then Predict: Understanding the Cross-Lingual Ability of Multilingual BERT ...
	Muller, Benjamin; Elazar, Yanai; Sagot, Benoît. - : arXiv, 2021
	BASE
	Show details

19	Synthetic Data Augmentation for Zero-Shot Cross-Lingual Question Answering ...
	The 2021 Conference on Empirical Methods in Natural Language Processing 2021; ., Jacopo; Keraron, Rachel. - : Underline Science Inc., 2021
	BASE
	Show details

20	Treebanking User-Generated Content: A Proposal for a Unified Representation in Universal Dependencies
	Sanguinetti, Manuela [Verfasser]; Bosco, Cristina [Verfasser]; Cassidy, Lauren [Verfasser]. - Mannheim : Leibniz-Institut für Deutsche Sprache (IDS), Bibliothek, 2020
	DNB Subject Category Language
	Show details

Page: 1 2 3 4 5

© 2013 - 2024 Lin|gu|is|tik | Imprint | Privacy Policy | Datenschutzeinstellungen ändern